
Conversation

@tac0turtle (Contributor) commented Oct 1, 2025

Overview

Short-term fix to clear the cache.

It turns out that 2MB is the max transaction size, not the max blob size. As a result, DA submission was down and the pending headers/data kept increasing: whenever a blob reached 2MB, submission would constantly fail.

@github-actions bot (Contributor) commented Oct 1, 2025

The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Oct 1, 2025, 1:15 PM |

@tac0turtle marked this pull request as ready for review October 1, 2025 11:13
claude bot (Contributor) commented Oct 1, 2025

Pull Request Review: Reduce Max Blob Size from 2MB to 1.5MB

Summary

This PR reduces the defaultMaxBlobSize constant from 2MB to 1.5MB as a short-term fix to clear cache issues. While the change itself is minimal, there are several considerations to address.


✅ Code Quality & Best Practices

Positive:

  • The change is minimal and focused
  • The constant is appropriately named and documented
  • The location (fallback blob size limit) is correct

Concerns:

  1. Magic Number: The 1.5 multiplier is a magic number. Consider defining it as a named constant:

    ```go
    const (
        defaultMaxBlobSizeMB = 1.5
        defaultMaxBlobSize   = defaultMaxBlobSizeMB * 1024 * 1024
    )
    ```
  2. Lack of Context: The PR description mentions "short term fix to clear cache" but:

    • What cache issue is this solving?
    • Why is 1.5MB the right threshold?
    • What is the long-term solution?
  3. DNM Flag: The PR is marked "Do Not Merge" - this should not be merged without clarification on the intended next steps.


🐛 Potential Issues

  1. Breaking Change Risk: Reducing the max blob size could cause issues for:

    • Existing deployments expecting 2MB capacity
    • Large transaction batches that previously fit
    • Integration tests that may be tuned to the 2MB limit
  2. Celestia Limits: Based on the codebase documentation, Celestia has specific blob size limits; the code mentions 1974272 bytes (~1.88MB) in several places. The relationship between the following should be clearly documented or validated:

    • This fallback limit (1.5MB)
    • Actual DA layer limits
    • Protocol overhead buffers

  3. Test Coverage Impact: The tests use DummyDA with maxBlobSize: 10_000_000 (10MB), which is significantly larger than the production default. This means:

    • Tests won't catch size-related issues at the 1.5MB boundary
    • Consider adding a test with realistic blob size limits

⚡ Performance Considerations

Positive:

  • Smaller blobs mean faster transmission
  • Reduced memory pressure per blob
  • May improve DA layer acceptance rates

Concerns:

  1. Increased Batch Count: With a lower limit, more batches may be needed:

    • More DA submissions = more gas costs
    • More network round trips
    • Increased retry logic invocations
  2. Batch Splitting Overhead: The limitBatchBySize function will trigger more frequently, causing:

    • Additional marshaling operations
    • More complexity in the retry state machine
    • Potential for more "StatusTooBig" responses requiring halving logic (lines 385-399); see the sketch after this list
  3. Missing Compression: Issue #2532 ("Add Compression to Blobs Before Submission") proposes blob compression, which could:

    • Allow fitting more data within size limits
    • Be a better long-term solution than reducing limits
    • Provide 20-40% compression ratios with zstd
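
To make the splitting and halving behavior concrete, here is a minimal sketch of the pattern described in item 2 above. It is a hypothetical reconstruction for illustration, not the actual limitBatchBySize implementation in ev-node:

```go
package submitting

import "fmt"

// limitBatchBySizeSketch returns the longest prefix of items whose combined
// size fits within maxBytes. Hypothetical reconstruction, names assumed.
func limitBatchBySizeSketch(items [][]byte, maxBytes uint64) [][]byte {
	var total uint64
	for i, item := range items {
		total += uint64(len(item))
		if total > maxBytes {
			return items[:i]
		}
	}
	return items
}

// submitWithHalving submits a batch and, on failure (e.g. a StatusTooBig
// response), halves the batch and retries, mirroring the halving logic the
// review refers to. The submit callback stands in for the real DA client.
func submitWithHalving(items [][]byte, maxBytes uint64, submit func([][]byte) error) error {
	batch := limitBatchBySizeSketch(items, maxBytes)
	for len(batch) > 0 {
		if err := submit(batch); err == nil {
			return nil
		}
		batch = batch[:len(batch)/2] // halve on failure and retry
	}
	return fmt.Errorf("a single item exceeds the max blob size")
}
```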

🔒 Security Considerations

Low Risk Change: This is primarily a resource limit adjustment with minimal security implications.

Considerations:

  1. DoS Surface: Smaller blobs could mean more submissions to achieve the same throughput, potentially increasing attack surface for DA layer spam
  2. Data Availability: Ensure smaller blobs don't fragment data in ways that impact sync performance for new nodes

🧪 Test Coverage

Current State:

  • Unit tests exist but use unrealistically large blob limits (10MB)
  • Integration tests may not exercise the edge cases at 1.5MB

Recommendations:

  1. Add a test case with realistic blob size limits:

    ```go
    func TestDASubmitter_SubmitData_AtSizeLimit(t *testing.T) {
        // Test with the 1.5MB limit to ensure proper batch splitting
        dummyDA := coreda.NewDummyDA(1.5*1024*1024, 0, 0, 10*time.Millisecond)
        // ... create data that approaches/exceeds the limit
    }
    ```
  2. Verify the limitBatchBySize function works correctly at the new threshold

  3. Test the halving logic in the retry mechanism (lines 385-399) with 1.5MB blobs


📋 Recommendations

Before Merging:

  1. Clarify Intent:

    • Remove DNM flag or explain why this should remain open
    • Document the specific cache issue being addressed
    • Provide data on why 1.5MB was chosen
  2. Add Documentation:

    • Update any documentation mentioning 2MB limits
    • Add comments explaining the relationship to Celestia's ~1.88MB limit
    • Document the protocol overhead budget
  3. Consider Alternatives:

    • Is compression (PR #2547, "feat: implement blob compression") a better solution?
    • Should this be configurable rather than a hard-coded constant?
    • What about dynamic adjustment based on DA layer feedback?
  4. Add Tests:

    • Test with realistic blob size limits
    • Verify batch splitting behavior at 1.5MB boundary
    • Test edge case where single item approaches 1.5MB
  5. Validate Impact:

    • Run integration tests against Celestia testnet
    • Measure increase in DA submission frequency
    • Calculate gas cost implications

Long-term Considerations:

  • Make blob size limit configurable per DA layer
  • Implement compression to maximize space utilization (see the sketch below)
  • Add metrics for blob size distribution and batch splitting frequency
  • Consider dynamic blob size adjustment based on DA layer conditions
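
As a rough illustration of the compression idea, here is a minimal sketch using the klauspost/compress zstd bindings. The integration point is an assumption; compression is only proposed (issue #2532), not implemented in ev-node:

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

// compressBlob compresses a blob with zstd before DA submission.
// Sketch only; not the design from issue #2532.
func compressBlob(blob []byte) ([]byte, error) {
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		return nil, err
	}
	defer enc.Close()
	return enc.EncodeAll(blob, nil), nil
}

func main() {
	blob := make([]byte, 2<<20) // 2MB of zeros: compresses far better than real data
	compressed, err := compressBlob(blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed %d bytes to %d\n", len(blob), len(compressed))
}
```

Real transaction batches compress far less than a zeroed buffer, which is why the 20-40% figure above is the realistic expectation.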

Verdict

Status: ⚠️ Changes Requested

This change is too simplistic for a complex issue. The DNM flag suggests uncertainty, and the lack of context makes it difficult to assess correctness. Before merging:

  1. Provide clear justification with data
  2. Add appropriate test coverage
  3. Document the relationship to DA layer limits
  4. Consider if compression is a better solution
  5. Remove DNM or close the PR

Code Quality: 6/10 - Clean change but lacks context and testing
Risk Level: Medium - Could impact production deployments
Test Coverage: 4/10 - Existing tests don't exercise realistic limits


Review conducted following project conventions from CLAUDE.md. Generated with Claude Code

claude bot (Contributor) commented Oct 1, 2025

Code Review: Reduce Max Blob Size

Overview

This PR reduces the defaultMaxBlobSize constant from 2MB to 1.5MB as a short-term fix to clear the cache.

Findings

✅ Positive Aspects

  1. Minimal, focused change - Single constant modification reduces risk
  2. Clear documentation - Comment explains this is a fallback limit
  3. Proper constant usage - Used through retryPolicy.MaxBlobBytes, allowing easy adjustment

⚠️ Concerns & Recommendations

1. Missing Context (Critical)
  • Issue: The PR description mentions "short term fix to clear cache" but doesn't explain:
    • Why is cache clearing needed?
    • How does reducing blob size address the cache issue?
    • What's the long-term solution?
    • What triggered this need?
  • Recommendation: Add more context to the PR description explaining the root cause and long-term plan
2. Test Coverage Gap (Important)
  • Issue: The test file da_retry_test.go:17 hardcodes MaxBlobBytes: 2 * 1024 * 1024
  • Impact: Test now uses a different value than production code
  • Location: block/internal/submitting/da_retry_test.go:17
  • Recommendation: Update the test to use the defaultMaxBlobSize constant or document why the test uses a different value (see the sketch after this list)
3. Potential Impact Analysis (Medium)
  • Behavior change: Blobs between 1.5MB and 2MB will now be rejected earlier in the submission pipeline
  • Risk: Could cause more frequent batch splitting in limitBatchBySize() (line 418-441)
  • Recommendation: Monitor DA submission metrics after deployment, especially:
    • Frequency of "batch too big" errors (line 398)
    • Average submission batch sizes
    • Single item rejection rates
4. Configuration Flexibility (Low Priority)
  • Observation: This is a fallback constant when DA layer doesn't provide limits
  • Current design: Hardcoded constant vs. configurable parameter
  • Question: Should this be a config option instead, given it's being tuned for operational reasons?
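
For item 2, reusing the production constant keeps the test from drifting again. A minimal sketch, assuming the test shares a package with defaultMaxBlobSize and that retryPolicy exposes the MaxBlobBytes field mentioned above:

```go
package submitting

import "testing"

// Sketch for da_retry_test.go: reference the production constant instead of
// hardcoding 2 * 1024 * 1024, so the test tracks the default automatically.
// Assumes retryPolicy and defaultMaxBlobSize are visible in this package.
func newTestRetryPolicy(t *testing.T) retryPolicy {
	t.Helper()
	return retryPolicy{
		MaxBlobBytes: defaultMaxBlobSize, // was: 2 * 1024 * 1024
	}
}
```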

Security & Performance

  • ✅ No security concerns identified
  • ✅ Performance impact is neutral-to-positive (smaller blobs may reduce memory pressure)
  • ✅ Proper error handling exists for oversized blobs

Style & Conventions

  • ✅ Follows project conventions per CLAUDE.md
  • ✅ Maintains code formatting standards

Recommended Actions

  1. Before merging: Update PR description with root cause analysis and long-term plan
  2. Before merging: Fix test inconsistency at da_retry_test.go:17
  3. After merging: Monitor DA submission metrics for unexpected behavior
  4. Follow-up: Consider whether this should be configurable rather than hardcoded (see the sketch below)
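
If configurability is pursued, the fallback could hang off the node config. A minimal sketch with hypothetical field and tag names:

```go
package config

// DAConfig sketches exposing the fallback blob size limit as a config
// option instead of a hardcoded constant. Field name is hypothetical.
type DAConfig struct {
	// MaxBlobBytes overrides the fallback blob size limit used when the
	// DA layer does not report one. Zero means "use the built-in default".
	MaxBlobBytes uint64 `mapstructure:"da_max_blob_bytes"`
}

// MaxBlobBytesOrDefault returns the configured override, or the current
// 1.5MB default when unset.
func (c DAConfig) MaxBlobBytesOrDefault() uint64 {
	if c.MaxBlobBytes > 0 {
		return c.MaxBlobBytes
	}
	return 1.5 * 1024 * 1024 // 1,572,864 bytes
}
```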

Overall Assessment

The code change itself is safe and well-implemented, but lacks sufficient context for reviewers to assess whether this is the right solution to the underlying problem.


🤖 Review generated by Claude Code

codecov bot commented Oct 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.14%. Comparing base (a3b4b0d) to head (85d36b1).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

```
@@           Coverage Diff           @@
##             main    #2720   +/-   ##
=======================================
  Coverage   66.14%   66.14%
=======================================
  Files          77       77
  Lines        7871     7871
=======================================
  Hits         5206     5206
  Misses       2179     2179
  Partials      486      486
```

| Flag | Coverage Δ |
| --- | --- |
| combined | 66.14% <100.00%> (ø) |

Flags with carried forward coverage won't be shown.


alpe previously approved these changes Oct 1, 2025
julienrbrt previously approved these changes Oct 1, 2025
@julienrbrt changed the title from "chore: reduce max size (DNM)" to "refactor(block): reduce max size in da submitter" Oct 1, 2025
```diff
 defaultGasPrice      = 0.0
 defaultGasMultiplier = 1.0
-defaultMaxBlobSize   = 2 * 1024 * 1024 // 2MB fallback blob size limit
+defaultMaxBlobSize   = 1.5 * 1024 * 1024 // 1.5MB fallback blob size limit
```

A Member commented on this diff:

if we merge this, we need to decrease it here too: https://github.com/evstack/ev-node/blob/main/da/jsonrpc/internal/consts.go#L20

@tac0turtle dismissed stale reviews from julienrbrt and alpe via 10621a0 October 1, 2025 12:58
claude bot (Contributor) commented Oct 1, 2025

Pull Request Review

I've reviewed PR #2720 which reduces the maximum blob size from 2MB to 1.5MB. Here's my detailed feedback:

Overview

This PR addresses a critical production issue where DA submissions were consistently failing because the code was incorrectly using the max transaction size (2MB) as the max blob size limit. The fix appropriately reduces the limit to 1.5MB as a short-term solution to restore DA functionality.


Positive Aspects

  1. Critical Bug Fix: Correctly identifies and fixes the root cause - the 2MB limit was for transactions, not blobs
  2. Consistent Application: Changes are applied in both locations (DA submitter and JSON-RPC client) ensuring consistency
  3. Good Cleanup: Removes the obsolete consts.go file and associated CI workflow that was no longer needed
  4. Preserves Test Coverage: Existing tests cover size validation logic in both client_test.go and da_submitter_test.go

⚠️ Issues & Concerns

1. Magic Number - Code Quality Issue (Priority: Medium)

Location: block/internal/submitting/da_submitter.go:28 and da/jsonrpc/client.go:18

The value 1.5 * 1024 * 1024 is now duplicated in two places without explanation of why 1.5MB specifically. This violates DRY principles.

Recommendations:

  • Add a comment explaining why 1.5MB (e.g., "1.5MB provides safety margin below 2MB tx limit")
  • Consider defining this in a single shared location (e.g., core/da/constants.go) to prevent drift
  • Document whether this is temporary or permanent

2. Incomplete Test Coverage (Priority: Medium)

Location: block/internal/submitting/da_retry_test.go:17

The test still uses the old 2MB value:

```go
MaxBlobBytes: 2 * 1024 * 1024,
```

This test won't catch regressions if the limit is accidentally increased back to 2MB.

Recommendation: Update this test to use the new 1.5MB limit or reference the constant directly.

3. Missing Documentation (Priority: Low)

The PR description mentions this is a "short term fix" but doesn't indicate:

  • What the long-term solution should be
  • Whether there's a follow-up issue tracked
  • How to determine the actual max blob size from the DA layer dynamically

Recommendation: Add a TODO comment or link to a follow-up issue for the proper fix.

4. Potential Type Conversion Issue (Priority: Low)

Location: da/jsonrpc/client.go:239

```go
client.DA.MaxBlobSize = uint64(DefaultMaxBlobSize)
```

Converting from an untyped float constant to uint64 could in principle introduce precision issues, though in practice 1.5MB converts exactly to 1,572,864 bytes.

Recommendation: Define the constant as an integer type directly to avoid float arithmetic:

```go
const DefaultMaxBlobSize uint64 = 1572864 // 1.5MB
```

🔒 Security Considerations

✅ No security concerns identified. The change reduces the size limit, which is inherently safer.


Performance Considerations

Neutral Impact: Reducing the limit means:

  • ✅ Fewer submission failures (positive)
  • ⚠️ More frequent submissions for same data volume (minor negative)
  • ✅ Faster retry loops due to smaller batches (positive)

The performance impact appears acceptable given this fixes a blocking production issue.


🧪 Testing Recommendations

  1. Update da_retry_test.go to use 1.5MB limit
  2. Add Integration Test: Verify submission succeeds with blobs approaching 1.5MB
  3. Regression Test: Ensure blobs slightly over 1.5MB properly trigger StatusTooBig and retry with smaller batches

📋 Action Items

Before Merge:

  • Update da_retry_test.go:17 to use 1.5MB
  • Add explanatory comment about why 1.5MB specifically
  • Consider using integer constant instead of float

Follow-up (can be separate PR):

  • Create issue to dynamically query the actual max blob size from the DA layer (see the sketch below)
  • Consider consolidating blob size constants in a single location
  • Add integration test for near-limit blob sizes
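
Dynamic discovery could be as simple as preferring a DA-reported limit over the compiled-in fallback. A minimal sketch; the interface shape below is an assumption, not ev-node's actual DA API:

```go
package submitting

import "context"

// blobSizeReporter stands in for whatever the DA client actually exposes;
// the method name and signature here are assumptions.
type blobSizeReporter interface {
	MaxBlobSize(ctx context.Context) (uint64, error)
}

// effectiveMaxBlobSize prefers the limit reported by the DA layer and only
// falls back to the built-in default when the query fails or reports zero.
func effectiveMaxBlobSize(ctx context.Context, da blobSizeReporter) uint64 {
	if size, err := da.MaxBlobSize(ctx); err == nil && size > 0 {
		return size
	}
	return 1.5 * 1024 * 1024 // current fallback default (1,572,864 bytes)
}
```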

🎯 Verdict

LGTM with minor changes requested

This is a necessary and well-executed fix for a production issue. The concerns raised are primarily about code quality and maintainability rather than correctness. The fix is safe to merge after addressing the test update in da_retry_test.go.

Estimated Merge Readiness: 90% - just needs the test update and a clarifying comment.


Review performed following ev-node coding standards from CLAUDE.md

julienrbrt previously approved these changes Oct 1, 2025

claude bot (Contributor) commented Oct 1, 2025

Code Review: Reduce Max Blob Size from 2MB to 1.5MB

Thank you for this important fix! I've reviewed the PR and have the following feedback:

✅ Positive Aspects

  1. Clear Problem Statement: The PR correctly identifies that 2MB is the max transaction size, not the max blob size, and addresses the DA submission failures.

  2. Consistent Application: The change is consistently applied across:

    • defaultMaxBlobSize constant in da_submitter.go:28
    • DefaultMaxBlobSize constant in pkg/cmd/run_node.go:29
    • All client instantiation points in run commands
    • Test file proxy_test.go:33
  3. Good Parameterization: Making maxBlobSize a parameter to NewClient() is the right architectural choice, enabling configurability.

  4. Appropriate Cleanup: Removing the da/jsonrpc/internal/consts.go file and the drift-checking workflow makes sense since these constants are no longer needed.

⚠️ Issues and Concerns

1. Test Coverage Gap - Critical

The test file block/internal/submitting/da_retry_test.go:17 still uses the old 2MB value:
```go
MaxBlobBytes: 2 * 1024 * 1024,
```

Recommendation: Update this test to use 1.5 * 1024 * 1024 for consistency. Even though it's just a test, having mismatched values could cause confusion and may not properly validate the new behavior.

Location: block/internal/submitting/da_retry_test.go:17

2. Missing Rationale for 1.5MB - Important

The PR doesn't explain why 1.5MB was chosen as the new limit. Questions:

  • Is this based on empirical data showing safe headroom?
  • Does this account for transaction overhead, encoding, or wrapping?
  • What's the relationship between the 2MB tx limit and the 1.5MB blob limit?

Recommendation: Add a comment in the code or PR description explaining the 1.5MB calculation. This helps future maintainers understand the safety margin.

3. Hardcoded Constant - Minor

The value is hardcoded in multiple places as 1.5 * 1024 * 1024. While this is consistent, it creates a maintenance burden if the value needs to change again.

Recommendation: Consider consolidating to a single source of truth:
```go
// In pkg/cmd/run_node.go or a shared constants file
const DefaultMaxBlobSize = 1.5 * 1024 * 1024 // 1.5MB - safe margin under 2MB tx limit
```
Then import and reuse this constant in da_submitter.go instead of redefining it.

4. Workflow Deletion - Needs Clarification

Removing .github/workflows/check_consts_drift.yml makes sense if you're no longer tracking Celestia app constants, but:

  • Was this workflow providing value for other use cases?
  • Should there be a new mechanism to prevent the blob size from accidentally increasing?

Recommendation: Consider if CI checks are needed to prevent regression (e.g., ensuring DefaultMaxBlobSize stays below some threshold; see the sketch below).
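
One lightweight option is a unit test that pins the relationship between the blob limit and the 2MB transaction limit, so CI fails if the constant drifts back up. A sketch, assuming it lives next to the DefaultMaxBlobSize constant:

```go
package cmd

import "testing"

func TestDefaultMaxBlobSizeBelowTxLimit(t *testing.T) {
	const maxTxSize = 2 * 1024 * 1024 // the DA tx size limit this PR identified
	if DefaultMaxBlobSize >= maxTxSize {
		t.Fatalf("DefaultMaxBlobSize (%d) must stay below the %d-byte tx limit",
			uint64(DefaultMaxBlobSize), maxTxSize)
	}
}
```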

5. Missing Integration Test Evidence

The PR description mentions "it turns out that 2MB is the max tx size and not the max blob size. This means DA is down and pending headers/data increases, when it was reaching 2MB it would constantly fail."

Recommendation:

  • Add a test case that validates behavior near the 1.5MB boundary
  • Run integration tests (make test-integration) to ensure this fixes the DA submission failures
  • Consider adding an E2E test that submits blobs close to the limit

🔒 Security Considerations

✅ No security concerns identified. The change reduces the allowed blob size, which is conservative and safe.

⚡ Performance Considerations

  • Positive: Smaller blobs may reduce memory pressure and network overhead
  • Potential Issue: If this limit is too conservative, it could increase the number of DA submissions needed for the same data, potentially increasing costs
  • Recommendation: Monitor DA submission frequency and costs after deployment to ensure 1.5MB provides adequate throughput (a metrics sketch follows below)
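
If monitoring is added, a histogram of submitted blob sizes would make the impact visible. A sketch using the Prometheus Go client; the metric name and bucket choices are assumptions, not existing ev-node metrics:

```go
package submitting

import "github.com/prometheus/client_golang/prometheus"

// blobSizeBytes tracks the size distribution of blobs submitted to the DA
// layer; buckets span 64KB up to 2MB, bracketing the 1.5MB default limit.
var blobSizeBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "da_blob_size_bytes",
	Help:    "Size distribution of blobs submitted to the DA layer.",
	Buckets: prometheus.ExponentialBuckets(64*1024, 2, 6), // 64KB .. 2MB
})

func init() {
	prometheus.MustRegister(blobSizeBytes)
}

// observeBlobSize records one submission's blob size.
func observeBlobSize(sizeBytes int) {
	blobSizeBytes.Observe(float64(sizeBytes))
}
```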

📝 Code Quality & Best Practices

Following the CLAUDE.md guidelines:

  1. Go Conventions: Code follows standard Go formatting
  2. Error Handling: No changes to error handling, existing patterns maintained
  3. Minimal Changes: The PR is focused and doesn't introduce unnecessary changes
  4. ⚠️ Documentation: Could benefit from inline comments explaining the 1.5MB choice

🧪 Testing Recommendations

Before merging:

```bash
# Run unit tests
make test

# Run integration tests (15m timeout)
make test-integration

# Run with the specific DA submitter tests
go test -v ./block/internal/submitting/...

# Test the jsonrpc client
go test -v ./da/jsonrpc/...
```

📋 Suggested Changes

  1. Update test constant (block/internal/submitting/da_retry_test.go:17):
    ```go
    MaxBlobBytes: 1.5 * 1024 * 1024, // Update to match new default
    ```

  2. Add explanatory comment (block/internal/submitting/da_submitter.go:28):
    ```go
    // defaultMaxBlobSize is set to 1.5MB to provide safe headroom under the 2MB
    // transaction size limit, accounting for transaction overhead and encoding
    defaultMaxBlobSize = 1.5 * 1024 * 1024
    ```

  3. Consider consolidating constants to reduce duplication between pkg/cmd/run_node.go and block/internal/submitting/da_submitter.go

Summary

This is a necessary fix that addresses a real production issue. The code changes are sound and consistently applied. The main concerns are:

  • Critical: Update the test in da_retry_test.go to use 1.5MB
  • Important: Document why 1.5MB was chosen
  • Nice-to-have: Consider constant consolidation and additional test coverage

Overall: Approve with suggested changes

The PR is on the right track, but addressing the test inconsistency and adding documentation would significantly improve code quality and maintainability.

@tac0turtle requested review from alpe and julienrbrt October 1, 2025 13:19
@tac0turtle added this pull request to the merge queue Oct 1, 2025
Merged via the queue into main with commit bc6a0b1 Oct 1, 2025 (28 checks passed)
@tac0turtle deleted the marko/reduce_max branch October 1, 2025 13:36
@github-project-automation bot moved this to Done in Evolve Oct 1, 2025